Existing deep-learning-based methods for unsupervised video object segmentation still rely on ground-truth segmentation masks for training; "unsupervised" in this context means that no annotated frames are used during inference. Since obtaining ground-truth segmentation masks for real-world scenes is a laborious task, we consider a simple framework for dominant moving object segmentation that neither requires annotated data for training nor relies on saliency cues or pre-trained optical flow maps. Inspired by layered image representations, we introduce a technique to group pixel regions according to their affine parametric motion. This enables our network to learn to segment the dominant foreground object using only RGB image pairs as input for both training and inference. We establish a baseline for this novel task using the new MOVERCARS dataset and show performance competitive with recent methods that require training with annotated masks.
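The affine parametric motion model used for grouping can be illustrated with a small least-squares fit: pixels whose displacements are well explained by one set of affine parameters belong to the same motion layer. This is a hypothetical sketch only (the paper's grouping is learned end-to-end from RGB pairs; `fit_affine_motion` is an illustrative name, not the authors' code):

```python
import numpy as np

def fit_affine_motion(coords, displacements):
    """Fit a 6-parameter affine motion model d = P^T [x, y, 1]
    to per-pixel displacements via least squares, and return the
    per-pixel residual used to decide region membership."""
    x, y = coords[:, 0], coords[:, 1]
    basis = np.stack([x, y, np.ones_like(x)], axis=1)   # (N, 3)
    # Solve the two displacement components jointly; params has shape (3, 2).
    params, *_ = np.linalg.lstsq(basis, displacements, rcond=None)
    residual = np.linalg.norm(basis @ params - displacements, axis=1)
    return params, residual

# Pixels moving under a pure translation (dx=2, dy=-1) fit perfectly:
coords = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
disp = np.tile([2.0, -1.0], (4, 1))
params, residual = fit_affine_motion(coords, disp)
```

A region whose residuals stay below a threshold is consistent with a single affine motion; pixels with large residuals would be assigned to a different layer.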
Language models have recently achieved strong performance across a wide range of NLP benchmarks. However, unlike benchmarks, real-world tasks are often poorly specified, and agents must deduce the user's intended behavior from a combination of context, instructions, and examples. We investigate how both humans and models behave in the face of such task ambiguity by proposing AmbiBench, a new benchmark of six ambiguously specified classification tasks. We evaluate humans and models on AmbiBench by seeing how well they identify the intended task using 1) instructions with varying degrees of ambiguity, and 2) different numbers of labeled examples. We find that the combination of model scaling (to 175B parameters) and training with human feedback data enables models to approach or exceed the accuracy of human participants across tasks, but that either one alone is not sufficient. In addition, we show how to dramatically improve the accuracy of language models trained without large-scale human feedback training by finetuning on a small number of ambiguous in-context examples, providing a promising direction for teaching models to generalize well in the face of ambiguity.
Autonomous driving has a natural bi-level structure. The goal of the upper behavioural layer is to provide appropriate lane-change, acceleration, and braking decisions to optimize a given driving task. However, this layer can only indirectly influence the driving efficiency through the lower-level trajectory planner, which takes in the behavioural inputs to produce motion commands. Existing sampling-based approaches do not fully exploit the strong coupling between the behavioural and planning layers. On the other hand, end-to-end Reinforcement Learning (RL) can learn a behavioural layer while incorporating feedback from the lower-level planner. However, purely data-driven approaches often fail on safety metrics in unseen environments. This paper presents a novel alternative: a parameterized bi-level optimization that jointly computes the optimal behavioural decisions and the resulting downstream trajectory. Our approach runs in real time using a custom GPU-accelerated batch optimizer and a warm-start strategy learnt with a Conditional Variational Autoencoder. Extensive simulations show that our approach outperforms state-of-the-art model predictive control and RL approaches in terms of collision rate while being competitive in driving efficiency.
Determining the treatment need for posterior capsular opacification (PCO) -- one of the most common complications of cataract surgery -- is a difficult process due to its local unavailability and the fact that treatment is provided only after PCO occurs in the central visual axis. In this paper we propose a deep learning (DL)-based method to first segment PCO images and then classify them into \textit{treatment required} and \textit{not yet required} cases in order to reduce frequent hospital visits. To train the model, we prepare a training image set with ground truths (GT) obtained from two strategies: (i) manual and (ii) automated. We thus have two models: (i) Model 1, trained on the image set with manual GT, and (ii) Model 2, trained on the image set with automated GT. When evaluated on the validation image set, both models gave a Dice coefficient greater than 0.8 and an intersection-over-union (IoU) score greater than 0.67 in our experiments. Comparison between the gold-standard GT and the segmented results from our models gave a Dice coefficient greater than 0.7 and an IoU score greater than 0.6 for both models, showing that automated ground truths can also yield an efficient model. Comparison between our classification results and the clinical classification shows a 0.98 F2-score for the outputs of both models.
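The Dice and IoU figures quoted above are standard overlap metrics for binary masks; a minimal sketch of how they are computed (not the authors' evaluation code) is:

```python
import numpy as np

def dice_and_iou(pred, gt):
    """Dice coefficient and intersection-over-union for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum())
    iou = inter / union
    return dice, iou

pred = np.array([[1, 1, 0], [0, 1, 0]])
gt   = np.array([[1, 1, 0], [0, 0, 1]])
dice, iou = dice_and_iou(pred, gt)   # inter=2, union=4 -> Dice=2/3, IoU=0.5
```

For binary masks the two metrics are linked by IoU = Dice / (2 - Dice), so the Dice > 0.8 threshold reported above corresponds exactly to the IoU > 0.67 threshold.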
Timely and effective response to humanitarian crises requires quick and accurate analysis of large amounts of text data - a process that can highly benefit from expert-assisted NLP systems trained on validated and annotated data in the humanitarian response domain. To enable the creation of such NLP systems, we introduce and release HumSet, a novel and rich multilingual dataset of humanitarian response documents annotated by experts in the humanitarian response community. The dataset provides documents in three languages (English, French, Spanish) and covers a variety of humanitarian crises from 2018 to 2021 across the globe. For each document, HumSet provides selected snippets (entries) as well as classes assigned to each entry, annotated using common humanitarian information analysis frameworks. HumSet also provides novel and challenging entry extraction and multi-label entry classification tasks. In this paper, we take a first step towards approaching these tasks and conduct a set of experiments on pre-trained language models (PLMs) to establish strong baselines for future research in this domain. The dataset is available at https://blog.thedeep.io/humset/.
Realizing the potential of neural video codecs on mobile devices is a major technical challenge due to the computational complexity of deep networks and the power constraints of mobile hardware. We demonstrate practical feasibility by leveraging Qualcomm's technology and innovation, bridging the gap from neural-network-based codec simulations running on wall-powered workstations to real-time operation on a mobile device powered by Snapdragon technology. We show the first-ever inter-frame neural video decoder running on a commercial mobile phone, decoding high-definition videos in real time while maintaining a low bitrate and high visual quality.
This work revisits the joint beamforming (BF) and antenna selection (AS) problem, as well as its robust beamforming (RBF) version under imperfect channel state information (CSI). Such problems arise when the number of radio frequency (RF) chains is smaller than the number of antenna elements at the transmitter, which has become a critical consideration in the era of large-scale arrays. The joint (R)BF & AS problem is a mixed integer nonlinear program, and thus finding the optimal solution is often costly, if not outright impossible. The vast majority of prior works tackled these problems using continuous-optimization-based approximations - but such approximations do not ensure the optimality or even feasibility of the solutions. The main contributions of this work are threefold. First, an effective branch-and-bound (B&B) framework for solving the problems of interest is proposed. Leveraging existing BF and RBF solvers, it is shown that the B&B framework guarantees global optimality of the considered problems. Second, to expedite the potentially costly B&B algorithm, a machine learning (ML)-based scheme is proposed to help skip intermediate states of the B&B search tree. The learning model features a graph neural network (GNN)-based design that is resilient to a challenge commonly encountered in wireless communications, namely, changes in problem size (e.g., the number of users) between the training and test stages. Third, comprehensive performance characterizations are presented, showing that the GNN-based method retains the global optimality of B&B under reasonable conditions, with reduced complexity. Numerical simulations also show that the ML-based acceleration often achieves considerable speedups relative to plain B&B.
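To illustrate the B&B idea on a toy version of antenna selection, the sketch below selects k of n antennas to maximize a stand-in additive gain objective with best-first search and bound-based pruning. The paper instead bounds each node by invoking a BF/RBF solver, so the function name and objective here are illustrative assumptions only:

```python
import heapq

def antenna_selection_bnb(gains, k):
    """Best-first branch-and-bound selecting k of n antennas to maximize
    total gain (toy objective standing in for a beamforming solver)."""
    n = len(gains)

    def upper_bound(chosen, idx):
        # Optimistically complete the selection with the largest gains
        # still available -> a valid upper bound for this subtree.
        remaining = sorted(gains[idx:], reverse=True)[: k - len(chosen)]
        return sum(gains[i] for i in chosen) + sum(remaining)

    best_val, best_set, nodes = float("-inf"), None, 0
    heap = [(-upper_bound([], 0), [], 0)]   # (negated bound, chosen, next index)
    while heap:
        neg_ub, chosen, idx = heapq.heappop(heap)
        nodes += 1
        if -neg_ub <= best_val:             # prune: cannot beat the incumbent
            continue
        if len(chosen) == k:                # leaf: a complete selection
            best_val, best_set = -neg_ub, chosen
            continue
        if len(chosen) + (n - idx) < k:     # not enough antennas left
            continue
        for child in (chosen + [idx], chosen):   # branch: include / exclude idx
            ub = upper_bound(child, idx + 1)
            if ub > best_val:
                heapq.heappush(heap, (-ub, child, idx + 1))
    return best_set, best_val, nodes

sel, val, nodes = antenna_selection_bnb([0.9, 0.1, 0.5, 0.7], k=2)
```

The ML acceleration described above would replace the exhaustive branching with a GNN that predicts which subtrees can be skipped, trading a guarantee-preserving search for fewer explored nodes.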
Dataset bias and spurious correlations can significantly undermine generalization in deep neural networks. Many prior efforts have addressed this problem using alternative loss functions or sampling strategies that focus on rare patterns. We propose a new direction: modifying the network architecture to impose inductive biases that make the network robust to dataset bias. Specifically, we propose OccamNets, which are biased by design to favor simpler solutions. OccamNets have two inductive biases. First, they are biased to use as little network depth as needed for an individual example. Second, they are biased toward using fewer image locations for prediction. While OccamNets are biased toward simpler hypotheses, they can learn more complex hypotheses if necessary. In experiments, OccamNets outperform or rival state-of-the-art methods run on architectures that do not incorporate these inductive biases. Furthermore, we demonstrate that when state-of-the-art debiasing methods are combined with OccamNets, the results improve further.
We present a dataset of 998 3D models of everyday tabletop objects along with 847,000 real-world RGB and depth images of them. Accurate annotation of the camera pose and object poses in each image is performed in a semi-automated fashion to facilitate the use of the dataset in a variety of 3D applications, such as shape reconstruction, object pose estimation, and shape retrieval. We focus on 3D shape reconstruction, which lacks an appropriate real-world benchmark, and demonstrate that our dataset can fill that gap. The entire annotated dataset, along with the source code of the annotation tools and evaluation baselines, is available at http://www.ocrtoc.org/3d-reconstruction.html.
Radio maps find numerous applications in wireless communications and mobile robotics tasks, including resource allocation, interference coordination, and mission planning. Although many techniques have been proposed to construct radio maps from spatially distributed measurements, the locations of such measurements are typically assumed to be predetermined. In contrast, this paper proposes spectrum surveying, where a mobile robot such as an unmanned aerial vehicle (UAV) collects measurements at a set of actively selected locations to obtain high-quality map estimates within a short surveying time. This is performed in two steps. First, two novel algorithms, a model-based online Bayesian estimator and a data-driven deep learning algorithm, are devised to update the map estimate and an uncertainty metric that indicates the informativeness of a measurement at each possible location. These algorithms offer complementary benefits and have constant complexity per measurement. Second, the uncertainty metric is used to plan the trajectory of the UAV so that it gathers measurements at the most informative locations. To overcome the combinatorial complexity of this problem, a dynamic programming approach is proposed to obtain a list of waypoints through areas of large uncertainty in linear time. Numerical experiments conducted on a real dataset confirm that the proposed scheme constructs accurate radio maps quickly.
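A hypothetical sketch of the dynamic-programming idea: on a grid of uncertainty values, pick one waypoint row per column, moving at most one row between consecutive columns, so as to maximize the accumulated uncertainty in O(rows x cols) time. The function name and the adjacency constraint are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def waypoints_through_uncertainty(unc):
    """DP over a grid of uncertainty values: choose one row per column,
    moving to an adjacent row between columns, to maximize the summed
    uncertainty along the path. Linear in the number of grid cells."""
    rows, cols = unc.shape
    score = unc[:, 0].copy()                 # best score ending at each row
    back = np.zeros((rows, cols), dtype=int) # backpointers for path recovery
    for c in range(1, cols):
        new = np.empty(rows)
        for r in range(rows):
            lo, hi = max(0, r - 1), min(rows, r + 2)
            prev = lo + int(np.argmax(score[lo:hi]))
            back[r, c] = prev
            new[r] = score[prev] + unc[r, c]
        score = new
    r = int(np.argmax(score))
    path = [r]
    for c in range(cols - 1, 0, -1):
        r = int(back[r, c])
        path.append(r)
    return path[::-1], float(score.max())

unc = np.array([[0.1, 0.9, 0.1],
                [0.8, 0.2, 0.7],
                [0.1, 0.1, 0.9]])
path, total = waypoints_through_uncertainty(unc)
```

Each entry of `path` would then be mapped back to a physical waypoint for the UAV, visiting high-uncertainty regions first.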